oolong

The goal of oolong [1] is to generate and administrate validation tests easily for typical automated content analysis tools such as topic models and dictionary-based tools.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("chainsawriot/oolong")

Example

Word intrusion test

newsgroup_stm is an example topic model trained with the data newsgroup5 using the stm package. Currently, this package supports structural topic models / correlated topic models from stm and Wrap LDA models from text2vec.

library(oolong)
library(stm)
#> stm v1.3.5 successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com
newsgroup_stm
#> A topic model with 10 topics, 4182 documents and a 8920 word dictionary.

To create an oolong test, use the function create_oolong_test.

oolong_test <- create_oolong(newsgroup_stm)
oolong_test
#> An oolong test object with k = 10, 0 coded. (0%  precision)
#>  Use the method $do_word_intrusion_test() to start word intrusion test.

As instructed, use the method $do_word_intrusion_test() to start coding. If you are running this in RStudio, you should see a test screen similar to this:

After the coding, you can look at the model precision by printing the oolong test.

oolong_test

Topic intrusion test

For example, newsgroup_stm was generated with the corpus newsgroup5$text

library(tibble)
newsgroup5
#> # A tibble: 4,182 x 3
#>    text                                                  title            ntoken
#>    <chr>                                                 <chr>             <int>
#>  1 "\nI agree.  So why is Cylink the only (and expensiv… sci.crypt           302
#>  2 "[This is a co-authored report from two of us who we… talk.politics.g…   2900
#>  3 "\njubilee, Pope >Leo the 12th had a medallion cast … soc.religion.ch…    463
#>  4 "\n\nAnd the 'Turkish Karabag' is next. As for 'Cypr… talk.politics.m…    929
#>  5 "The subject says what I would like to do, here are … comp.sys.mac.ha…    218
#>  6 "\n\n\n\nAre you related to 'Arromdian' of ASALA/SDP… talk.politics.m…    920
#>  7 "\n#Rick Anderson replied to my letter with...\n#\n#… talk.religion.m…   1760
#>  8 "                                      ^^^^^^\nNo ar… rec.sport.baseb…    302
#>  9 "\nWhy thanks for your reply to my post.  By the way… talk.politics.g…    158
#> 10 " >When some types of client windows are displayed, … comp.windows.x      158
#> # … with 4,172 more rows

Create the oolong test object with the corpus used for training the topic model will generate topic intrusion test cases.

oolong_test <- create_oolong(newsgroup_stm, newsgroup5$text)
oolong_test
#> An oolong test object with k = 10, 0 coded. (0%  precision)
#>  Use the method $do_word_intrusion_test() to start word intrusion test.
#> With 15 cases of topic intrusion test. 0 coded. Use the method $do_topic_intrusion_test() to start topic intrusion test.

[1] /ˈuːlʊŋ/ 烏龍, literally means "Dark Dragon", is a semi-oxidized tea from Asia. It is very popular in Taiwan, Japan and Hong Kong